Skip to content

Conversation

benchaplin
Copy link
Contributor

There's logic in AbstractInternalTerms that sets the top-level doc_count_error_upper_bound to 0 if only one aggregation is being reduced. However, now that we have batched query execution, it's possible a reduction of multiple aggregations has already occurred on a data node, and is now undergoing a final reduction on the coordinating node.

I discovered this case while attempting to remove the settings override turning off batched query execution in TermsDocCountErrorIT (testFixedDocs with -Dtests.seed=ABFC03388645940D):

  • 3 shards on a data node, 0 shards on the coordinating node
  • batched request performs partial reductions on the data node, calculating the expected value of doc_count_error_upper_bound = 46
  • coordinating node attempts the final reduction, but since its reducing only one agg, sets the doc_count_error_upper_bound to 0

I've attempted a lightweight fix here: flag the existence of a batched query result in the AggregationReduceContext, then check this flag when potentially setting doc_count_error_upper_bound to 0. If a batched query result is present, it implies multiple shards were reduced remotely (a single shard on a data node is not batched).

@benchaplin benchaplin added >non-issue :Analytics/Aggregations Aggregations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch :Search Foundations/Search Catch all for Search Foundations v9.2.0 v9.1.5 labels Sep 12, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search-foundations (Team:Search Foundations)

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic makes sense to me, I left one question and some nits regarding how to expose the new flag.


protected abstract AggregationReduceContext forSubAgg(AggregationBuilder sub);

public boolean doesFinalReduceHaveBatchedResult() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: that feels weird since it depends on isFinalReduce? Maybe rename it into hasBatchedResult so that callers have to check isFinalReduce and hasBatchedResult?

return finalReduceHasBatchedResult;
}

public void setFinalReduceHasBatchedResult(boolean finalReduceHasBatchedResult) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be final and set in the ForFinal ctr?

// If we are reducing only one aggregation (size == 1), the doc count error should be 0.
// However, the presence of a batched query result implies this is a final reduction and a partial reduction with size > 1
// has already occurred on a data node. The doc count error should not be 0 in this case.
docCountError = size == 1 && reduceContext.doesFinalReduceHaveBatchedResult() == false ? 0 : sumDocCountError;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that also handle the case where the partial reduction happens on the coord node (when reaching reduce batch size)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/Aggregations Aggregations >non-issue :Search Foundations/Search Catch all for Search Foundations Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) Team:Search Foundations Meta label for the Search Foundations team in Elasticsearch v9.1.5 v9.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants